XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology
Abstract
The application of the formalism of two-level morphology to Basque and its use in the elaboration of the XUXEN spelling checker/corrector are described. This application is intended to cover a large part of the language. Because Basque is a highly inflected language, the approach of spelling checking and correction has been conceived as a by-product of a general purpose morphological analyzer/generator. This analyzer is taken as a basic tool for current and future work on automatic processing of Basque. An extension for continuation class specifications in order to deal with long-distance dependencies is proposed. This extension consists basically of two features added to the standard formalism which allow the lexicon builder to make explicit the interdependencies of morphemes. User-lexicons can be interactively enriched with new entries enabling the checker from then on to recognize all the possible flexions derived from them. Due to a late process of standardization of the language, writers don't always know the standard form to be used and commit errors. The treatment of these "typical errors" is made in a specific way by means of describing them using the two-level lexicon system. In this sense, XUXEN is intended as a useful tool for standardization purposes of present day written Basque.
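As a rough illustration of the continuation-class idea mentioned in the abstract, the following minimal sketch accepts a word only if it can be segmented into morphemes that follow one another according to their continuation classes, and shows how adding one user-lexicon stem makes its inflected forms recognizable. The names `LEXICON`, `recognize` and `add_user_stem`, as well as the tiny noun paradigm, are assumptions made for illustration; this is not XUXEN's lexicon or implementation, and it leaves out the two-level rules and the proposed long-distance extension entirely.

```python
# Toy lexicon (hypothetical): each sublexicon maps a morpheme to the
# continuation class that may follow it. "End" marks a complete word.
LEXICON = {
    "Root": {"etxe": "Noun", "mendi": "Noun"},      # 'house', 'mountain'
    "Noun": {"": "End", "a": "Case", "an": "End"},  # bare stem, article, inessive
    "Case": {"": "End", "k": "End"},                # article alone, or article + k
}


def recognize(word, state="Root"):
    """Return True if `word` can be fully segmented starting from `state`."""
    if state == "End":
        return word == ""
    for morpheme, next_state in LEXICON[state].items():
        # Try every morpheme of the current sublexicon that prefixes the word.
        if word.startswith(morpheme) and recognize(word[len(morpheme):], next_state):
            return True
    return False


def add_user_stem(stem, continuation="Noun"):
    """Add a user-lexicon entry; its inflected forms become recognizable."""
    LEXICON["Root"][stem] = continuation
```

With this toy data, `recognize("etxeak")` succeeds through the path `etxe` + `a` + `k`, while a form such as `etxeka` is rejected because no continuation class licenses it. After `add_user_stem("liburu")`, forms such as `liburuak` are accepted without listing them individually.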
Similar resources
A Morphological Analysis Based Method for Spelling Correction
Xuxen is a spelling checker/corrector for Basque which is going to be commercialized next year. The checker recognizes a word-form if a correct morphological breakdown is allowed. The morphological analysis is based on two-level morphology. The correction method distinguishes between orthographic errors and typographical errors. • Typographical errors (or mistypings) are non-cognitive errors whic...
Designing spelling correctors for inflected languages using lexical transducers
This paper describes the components used in the design of the commercial Xuxen II spelling checker/corrector for Basque. It is a new version of the Xuxen spelling corrector (Aduriz et al., 97) which uses lexical transducers to improve the process. A very important new feature is the use of user dictionaries whose entries can recognise both the original and inflected forms. In languages wit...
Spelling Correction: from Two-Level Morphology to Open Source
Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of language and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use...
Using Finite State Technology in Natural Language Processing of Basque
This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite state technology and are devoted to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We think that our design can be interesting for the treatment of other languages. The main components developed are a general and robust morph...